Some Common Tricks in PyTorch Implementations


2024-07-18 03:59:09 | Source: compiled from the web

Model Statistics

Counting the total number of parameters:

    num_params = sum(param.numel() for param in model.parameters())

Weight Regularization

Earlier approaches

L2/L1 Regularization

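In machine learning, the loss function is almost always followed by an extra term. The two most common extra terms are **L1 regularization and L2 regularization**, also called the **L1 norm and the L2 norm**.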

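L1 and L2 regularization can be viewed as penalty terms on the loss function. The "penalty" places constraints on some of the parameters in the loss function.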

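- L1 regularization is the **sum of the absolute values** of the elements of the weight vector w, usually written ${||w||}_1$.
- L2 regularization is the **square root of the sum of the squares** of the elements of the weight vector w, usually written ${||w||}_2$.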
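As a quick check of these two definitions, the sketch below computes both norms for a small vector, once directly from the definition and once with `Tensor.norm`. The vector `w` is an arbitrary illustrative value and does not come from the original article.

```python
import torch

# Illustrative weight vector (an assumption made for this example).
w = torch.tensor([3.0, -4.0])

# L1 norm: sum of absolute values.
l1_manual = w.abs().sum()
# L2 norm: square root of the sum of squares.
l2_manual = w.pow(2).sum().sqrt()

# The same quantities via the built-in norm method.
l1_builtin = w.norm(p=1)
l2_builtin = w.norm(p=2)

print(l1_manual.item(), l2_manual.item())  # 7.0 5.0
```

Note that the penalty used in the L2 code in this article is the squared norm scaled by 0.5, which has the simple gradient `reg * w`.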

The effects of L1 and L2 regularization are listed below; these statements can be found in many articles.

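- L1 regularization can produce a sparse weight matrix, that is, a sparse model, so it can be used for feature selection.
- L2 regularization helps prevent overfitting; to some extent, L1 can prevent overfitting as well.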
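One way to see why L1 tends to produce exact zeros while L2 only shrinks weights is a one-dimensional toy problem. This illustration is added here and is not part of the original article. Minimizing `0.5*(w - a)**2 + lam*abs(w)` has the closed-form solution known as soft thresholding, while minimizing `0.5*(w - a)**2 + 0.5*lam*w**2` gives `a / (1 + lam)`. The names `a`, `lam`, `l1_solution`, and `l2_solution` are hypothetical.

```python
def l1_solution(a, lam):
    """Minimizer of 0.5*(w - a)**2 + lam*abs(w): soft thresholding."""
    if abs(a) <= lam:
        return 0.0  # small values are set exactly to zero
    return a - lam if a > 0 else a + lam


def l2_solution(a, lam):
    """Minimizer of 0.5*(w - a)**2 + 0.5*lam*w**2: uniform shrinkage."""
    return a / (1.0 + lam)


# Illustrative unregularized weights and penalty strength (assumptions).
unregularized = [2.0, 0.3, -0.1, -1.5]
lam = 0.5

l1_weights = [l1_solution(a, lam) for a in unregularized]
l2_weights = [l2_solution(a, lam) for a in unregularized]

print(l1_weights)  # the two small entries become exactly 0.0
print(l2_weights)  # every entry is scaled down but stays nonzero
```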

Implementing L2 regularization:

    import torch

    reg = 1e-6
    l2_loss = torch.zeros(1)  # accumulator for the penalty
    for name, param in model.named_parameters():
        if 'bias' not in name:
            # 0.5 * reg * sum of squared weights
            l2_loss = l2_loss + (0.5 * reg * torch.sum(torch.pow(param, 2)))

Implementing L1 regularization:

    reg = 1e-6
    l1_loss = torch.zeros(1)  # accumulator for the penalty
    for name, param in model.named_parameters():
        if 'bias' not in name:
            # reg * sum of absolute weights
            l1_loss = l1_loss + (reg * torch.sum(torch.abs(param)))

Orthogonal Regularization

    reg = 1e-6
    orth_loss = torch.zeros(1)
    for name, param in model.named_parameters():
        if 'bias' not in name:
            param_flat = param.view(param.shape[0], -1)
            # Gram matrix W W^T; it equals the identity when the rows are orthonormal
            sym = torch.mm(param_flat, torch.t(param_flat))
            sym = sym - torch.eye(param_flat.shape[0])
            # Sum absolute deviations so that entries of opposite sign cannot cancel
            orth_loss = orth_loss + (reg * sym.abs().sum())

Max Norm Constraint

Simply put, this places a direct limit on the magnitude of w.

    def max_norm(model, max_val=3, eps=1e-8):
        # Rescale in place; assigning to the loop variable would leave the model unchanged.
        with torch.no_grad():
            for name, param in model.named_parameters():
                if 'bias' not in name:
                    norm = param.norm(2, dim=0, keepdim=True)
                    desired = torch.clamp(norm, 0, max_val)
                    param.mul_(desired / (eps + norm))

L2 Regularization

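The most direct way to apply L2 regularization in PyTorch is the optimizer's built-in weight_decay option. It sets the weight decay rate, which corresponds to λ in L2 regularization.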

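    import torch
    from torch import optim

    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-5)

    # Adding the penalty to the loss by hand
    lambda_ = torch.tensor(1.)  # `lambda` is a reserved word in Python
    l2_reg = torch.tensor(0.)
    for param in model.parameters():
        l2_reg = l2_reg + torch.norm(param)
    loss = loss + lambda_ * l2_reg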
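To make the link between weight_decay and an L2 penalty concrete, here is a minimal sketch under stated assumptions: plain SGD without momentum, a tiny `nn.Linear` model, random data, and an illustrative decay value `wd`, none of which come from the original article. With plain SGD, `weight_decay=wd` adds `wd * p` to each gradient, which is the gradient of `0.5 * wd * sum(p**2)`, so one step of either variant should give the same parameters.

```python
import copy

import torch
from torch import nn, optim

torch.manual_seed(0)
wd = 0.1  # illustrative weight-decay value (an assumption)

# Two identical models so that the two update rules can be compared.
model_a = nn.Linear(4, 2)
model_b = copy.deepcopy(model_a)
x, y = torch.randn(8, 4), torch.randn(8, 2)

# (a) Let the optimizer apply weight decay.
opt_a = optim.SGD(model_a.parameters(), lr=0.1, weight_decay=wd)
opt_a.zero_grad()
nn.functional.mse_loss(model_a(x), y).backward()
opt_a.step()

# (b) Add 0.5 * wd * sum(p**2) to the loss by hand. Biases are included
#     because weight_decay in (a) is applied to every parameter.
opt_b = optim.SGD(model_b.parameters(), lr=0.1)
opt_b.zero_grad()
penalty = sum(p.pow(2).sum() for p in model_b.parameters())
(nn.functional.mse_loss(model_b(x), y) + 0.5 * wd * penalty).backward()
opt_b.step()

same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

The hand-written penalty that uses `torch.norm(param)` (the unsquared norm) is a different penalty, so it does not reproduce weight_decay exactly.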

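Optimizers also support per-parameter options, which set options for individual groups of parameters to meet finer-grained requirements. Usage is simple: instead of passing an iterable of Variables, pass an iterable of dicts. Each dict must contain a params key that specifies the parameters to optimize, and the other keys must match the optimizer's own keyword arguments.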

    from torch import nn, optim

    optim.SGD([
        {'params': model.base.parameters()},
        {'params': model.classifier.parameters(), 'lr': 1e-3}
    ], lr=1e-2, momentum=0.9)

    weight_p, bias_p = [], []
    for name, p in model.named_parameters():
        if 'bias' in name:
            bias_p += [p]
        else:
            weight_p += [p]
    # Parameter names in the model are assigned automatically: weights contain
    # "weight" and biases contain "bias", so the name tells you which is which.
    # This differs from TensorFlow, where users can choose the names themselves
    # (although the system can also assign them).
    optim.SGD([
        {'params': weight_p, 'weight_decay': 1e-5},
        {'params': bias_p, 'weight_decay': 0}
    ], lr=1e-2, momentum=0.9)

L1 Regularization

    criterion = nn.CrossEntropyLoss()
    classify_loss = criterion(input=out, target=batch_train_label)
    lambda_ = torch.tensor(1.)  # `lambda` is a reserved word in Python
    l1_reg = torch.tensor(0.)
    for param in model.parameters():
        l1_reg = l1_reg + torch.sum(torch.abs(param))
    loss = classify_loss + lambda_ * l1_reg

Defining a regularization class

    # Check whether a GPU is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # device = 'cuda'
    print("-----device:{}".format(device))
    print("-----Pytorch version:{}".format(torch.__version__))

    class Regularization(torch.nn.Module):
        def __init__(self, model, weight_decay, p=2):
            '''
            :param model: the model
            :param weight_decay: regularization coefficient
            :param p: exponent used in the norm; defaults to the 2-norm.
                      p=2 gives L2 regularization, p=1 gives L1 regularization
            '''
            super(Regularization, self).__init__()
            # [...] the source text breaks off here, after "if weight_decay"

    if weight_decay > 0:
        reg_loss = Regularization(model, weight_decay, p=2).to(device)
    else:
        print("no regularization")

    criterion = nn.CrossEntropyLoss().to(device)  # CrossEntropyLoss = softmax + cross entropy
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # no need to pass weight_decay

    # train
    batch_train_data = ...
    batch_train_label = ...

    out = model(batch_train_data)

    # loss and regularization
    loss = criterion(input=out, target=batch_train_label)
    if weight_decay > 0:
        loss = loss + reg_loss(model)
    total_loss = loss.item()  # Python float, for logging only

    # backprop
    optimizer.zero_grad()  # clear all accumulated gradients
    loss.backward()        # backward() must be called on the tensor, not on the float
    optimizer.step()

Learning Rate Decay

torch.optim.lr_scheduler

By number of epochs

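Every step_size epochs, the learning rate is multiplied by gamma, so after k such steps it equals the initial learning rate times gamma^k.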
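The rule above can be sketched with `torch.optim.lr_scheduler.StepLR` and compared against its closed form. The placeholder model, `step_size=30`, `gamma=0.1`, and the 90-epoch loop are illustrative assumptions.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(2, 1)  # placeholder model (an assumption for this example)
optimizer = optim.SGD(model.parameters(), lr=0.05)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

observed = []
for epoch in range(90):
    # Learning rate in effect during this epoch.
    observed.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # a real loop would compute a loss and call backward() first
    scheduler.step()   # advance the schedule once per epoch

# Closed form of the same rule: initial_lr * gamma ** (epoch // step_size).
expected = [0.05 * 0.1 ** (epoch // 30) for epoch in range(90)]

print(observed[0], observed[30], observed[60])
```

Calling `scheduler.step()` after `optimizer.step()` follows the order that current PyTorch versions expect.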

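    optimizer = optim.SGD(params=model.parameters(), lr=0.05)

    # lr_scheduler.StepLR()
    # Assuming optimizer uses lr = 0.05 for all groups
    # lr = 0.05     if epoch < 30
    # lr = 0.005    if 30 <= epoch < 60

[...] -> Tensor or None

    from torchvision.models import alexnet

    model = alexnet(pretrained=True).to(device)
    outputs = []

    def hook(module, input, output):
        outputs.append(output)

    print(len(outputs))
    # A backward hook is called with (module, grad_input, grad_output), so this
    # collects gradients. To collect activations, use register_forward_hook.
    handle = model.features[0].register_backward_hook(hook)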

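Note: you can also define a feature-extraction class, or even rebuild the network as an identical model whose layers are separate modules, which turns the problem into the first case.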
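For capturing intermediate features, a forward hook is one option. The sketch below is a minimal example under stated assumptions: a small stand-in `nn.Sequential` network replaces the pretrained AlexNet so that nothing has to be downloaded, and the names `features` and `save_output` are hypothetical.

```python
import torch
from torch import nn

# Small stand-in network (an assumption; the article uses a pretrained AlexNet).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, kernel_size=3, padding=1),
)

features = []


def save_output(module, inputs, output):
    # Forward hooks are called as hook(module, input, output) after forward().
    features.append(output.detach())


handle = model[0].register_forward_hook(save_output)

with torch.no_grad():
    model(torch.randn(1, 3, 16, 16))

handle.remove()  # detach the hook once the features have been collected
print(len(features), tuple(features[0].shape))  # 1 (1, 8, 16, 16)
```

Removing the handle matters in longer scripts, because otherwise every later forward pass keeps appending to the list.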

Counting model parameters

    def count_parameters(model):
        # Count only trainable parameters (those with requires_grad=True)
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

Custom Operations (Function)

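class torch.autograd.Function defines formulas for differentiating operations and records operation history. Every operation performed on a Tensor creates a new function object that performs the computation and records that it happened. The history is kept as a DAG of functions, with edges denoting data dependencies (input <- output). When backward is called, the graph is processed in topological order by calling the backward() method of each Function object and passing the returned gradients on to the next Function.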

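In general, the only way users interact with functions is by creating subclasses and defining new operations. This is the recommended way of extending torch.autograd.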

Notes on creating a subclass

- The subclass must override forward() and backward(), both as static methods declared with the @staticmethod decorator.
- forward() must accept a context ctx as its first argument; the context can store tensors to be retrieved during the backward pass. Any number of further arguments (tensors or other types) may follow.
- backward() must accept a context ctx as its first argument; the context is used to retrieve the tensors saved during the forward pass. Its remaining arguments are the gradients with respect to the outputs of forward(), one per value that forward() returned. Its return values are the gradients with respect to the inputs of forward(), one per input of forward().
- Call the operation with class_name.apply(arg).

Example 1: a custom ReLU activation function

    import torch

    class MyReLU(torch.autograd.Function):
        """
        We can implement our own custom autograd Functions by subclassing
        torch.autograd.Function and implementing the forward and backward passes
        which operate on Tensors.
        """

        @staticmethod
        def forward(ctx, input):
            """
            In the forward pass we receive a Tensor containing the input and return
            a Tensor containing the output. ctx is a context object that can be used
            to stash information for backward computation. You can cache arbitrary
            objects for use in the backward pass using the ctx.save_for_backward method.
            """
            ctx.save_for_backward(input)
            return input.clamp(min=0)

        @staticmethod
        def backward(ctx, grad_output):
            """
            In the backward pass we receive a Tensor containing the gradient of the loss
            with respect to the output, and we need to compute the gradient of the loss
            with respect to the input.
            """
            input, = ctx.saved_tensors
            grad_input = grad_output.clone()
            grad_input[input < 0] = 0
            return grad_input

Example 2: a custom OHEMHingeLoss loss function

    # from https://github.com/yjxiong/action-detection
    class OHEMHingeLoss(torch.autograd.Function):
        """
        This class is the core implementation for the completeness loss in paper.
        It computes class-wise hinge loss and performs online hard negative mining (OHEM).
        """

        @staticmethod
        def forward(ctx, pred, labels, is_positive, ohem_ratio, group_size):
            n_sample = pred.size()[0]
            assert n_sample == len(labels), "mismatch between sample size and label size"
            losses = torch.zeros(n_sample)
            slopes = torch.zeros(n_sample)
            for i in range(n_sample):
                losses[i] = max(0, 1 - is_positive * pred[i, labels[i] - 1])
                slopes[i] = -is_positive if losses[i] != 0 else 0

            # Keep only the hardest examples (largest losses) in each group
            losses = losses.view(-1, group_size).contiguous()
            sorted_losses, indices = torch.sort(losses, dim=1, descending=True)
            keep_num = int(group_size * ohem_ratio)
            loss = torch.zeros(1, device=pred.device)  # follows pred instead of a hard-coded .cuda()
            for i in range(losses.size(0)):
                loss += sorted_losses[i, :keep_num].sum()

            ctx.loss_ind = indices[:, :keep_num]
            ctx.labels = labels
            ctx.slopes = slopes
            ctx.shape = pred.size()
            ctx.group_size = group_size
            ctx.num_group = losses.size(0)
            return loss

        @staticmethod
        def backward(ctx, grad_output):
            labels = ctx.labels
            slopes = ctx.slopes
            grad_in = torch.zeros(ctx.shape)
            for group in range(ctx.num_group):
                for idx in ctx.loss_ind[group]:
                    loc = idx + group * ctx.group_size
                    grad_in[loc, labels[loc] - 1] = slopes[loc] * grad_output.item()
            # One return value per forward() input; only pred receives a gradient
            return grad_in.to(grad_output.device), None, None, None, None
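The calling convention described in the notes above (static forward() and backward(), invoked through apply) can be sketched with a deliberately tiny operation and checked numerically. `Square` is a hypothetical example class written for this illustration, and the use of `torch.autograd.gradcheck` is an addition that the original article does not mention.

```python
import torch


class Square(torch.autograd.Function):
    """Hypothetical example op: y = x ** 2 with a hand-written gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # stash the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # d(x^2)/dx = 2x, chained with the incoming gradient.
        return 2 * x * grad_output


x = torch.tensor([1.0, -2.0, 3.0], dtype=torch.double, requires_grad=True)
y = Square.apply(x)       # custom Functions are called through .apply
y.sum().backward()
print(x.grad)             # the gradient equals 2 * x

# Compare the analytic backward() with finite differences (double precision).
x_check = x.detach().clone().requires_grad_(True)
ok = torch.autograd.gradcheck(Square.apply, (x_check,))
print(ok)
```

Running gradcheck in double precision is a cheap way to catch sign or indexing mistakes in a hand-written backward().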
